
    The psychological reality of rhythm classes: Perceptual studies

    Linguists have traditionally classified languages into three rhythm classes, namely stress-timed, syllable-timed and mora-timed languages. However, this classification has remained controversial for various reasons: reliable acoustic cues to the different rhythm types have long proved elusive; some languages are claimed to belong to none of the three classes; and few perceptual studies have bolstered the notion. We have previously proposed an acoustic/phonetic model of the different types of linguistic rhythm, and of their categorisation as such by listeners. Here, we present perceptual experiments that directly test the notion of rhythm classes, our model's predictions, and the question of intermediate languages. Language discrimination experiments were run using a speech resynthesis technique to ensure that only rhythmic cues were available to the subjects. The languages investigated were English, Dutch, Spanish, Catalan and Polish. Our results are consistent with the idea that English and Dutch are stress-timed and Spanish and Catalan are syllable-timed, but Polish seems to differ from every other language studied and may thus constitute a new rhythm class. We propose that perceptual studies tapping the ability to discriminate languages' rhythm are the proper way to generate more empirical data relevant to rhythm typology.
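
    The acoustic/phonetic model mentioned above is not spelled out in the abstract; one common instantiation of such rhythm-class models measures the proportion of vocalic time (%V) and the variability of consonantal interval durations (ΔC) in an utterance. The sketch below computes these two quantities from hand-labelled interval durations; the labels, durations and function name are illustrative, not data or code from the study.

        # Minimal sketch, assuming segmented vocalic/consonantal intervals are available.
        # %V and deltaC are one standard pair of rhythm metrics; the paper's own model
        # may differ in its exact formulation.
        import statistics

        def rhythm_metrics(intervals):
            """intervals: list of (label, duration_s), label 'V' (vocalic) or 'C' (consonantal)."""
            vowels = [d for lab, d in intervals if lab == 'V']
            consonants = [d for lab, d in intervals if lab == 'C']
            percent_v = 100.0 * sum(vowels) / (sum(vowels) + sum(consonants))
            delta_c = statistics.pstdev(consonants)  # spread of consonantal interval durations
            return percent_v, delta_c

        # Toy utterance: alternating consonantal and vocalic intervals (durations in seconds)
        print(rhythm_metrics([('C', 0.08), ('V', 0.12), ('C', 0.15), ('V', 0.10), ('C', 0.07)]))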

    A Temporal Coherence Loss Function for Learning Unsupervised Acoustic Embeddings

    We train neural networks of varying depth with a loss function which constrains the output representations to have a temporal profile resembling that of phonemes. We show that a simple loss function which maximizes the dissimilarity between near frames and long-distance frames helps to construct a speech embedding that improves phoneme discriminability, both within and across speakers, even though the loss function only uses within-speaker information. However, with too deep an architecture, this loss function yields overfitting, suggesting the need for more data and/or regularization.
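
    The abstract does not give the exact form of the loss; the sketch below shows one plausible margin-based reading, in which frames that are temporally close should be more similar in embedding space than frames that are far apart. The offsets, margin and use of cosine similarity are assumptions for illustration, not the paper's implementation.

        # Hedged sketch of a temporal-coherence-style loss (PyTorch assumed available).
        import torch
        import torch.nn.functional as F

        def temporal_coherence_loss(frames, near=1, far=15, margin=0.5):
            """frames: (T, D) tensor of frame embeddings for one utterance."""
            T = frames.size(0)
            anchors = frames[: T - far]
            near_frames = frames[near : T - far + near]   # neighbours `near` steps away
            far_frames = frames[far:]                     # frames `far` steps away
            sim_near = F.cosine_similarity(anchors, near_frames, dim=-1)
            sim_far = F.cosine_similarity(anchors, far_frames, dim=-1)
            # Hinge: near-frame similarity should exceed far-frame similarity by `margin`.
            return F.relu(margin - (sim_near - sim_far)).mean()

        frames = torch.randn(100, 64, requires_grad=True)
        print(temporal_coherence_loss(frames))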

    Learning weakly supervised multimodal phoneme embeddings

    Recent works have explored deep architectures for learning multimodal speech representations (e.g. audio and images, articulation and audio) in a supervised way. Here we investigate the role of combining different speech modalities, i.e. audio and visual information representing lip movements, in a weakly supervised way using Siamese networks and lexical same-different side information. In particular, we ask whether one modality can benefit from the other to provide a richer representation for phone recognition in a weakly supervised setting. We introduce mono-task and multi-task methods for merging speech and visual modalities for phone recognition. Mono-task learning consists of applying a Siamese network to the concatenation of the two modalities, while multi-task learning receives several different combinations of modalities at training time. We show that multi-task learning enhances discriminability for visual and multimodal inputs while minimally impacting auditory inputs. Furthermore, we present a qualitative analysis of the obtained phone embeddings, and show that cross-modal visual input can improve the discriminability of phonological features which are visually discernible (rounding, open/close, labial place of articulation), resulting in representations that are closer to abstract linguistic features than those based on audio only.
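
    As a rough illustration of the mono-task setting described above, the sketch below applies a single Siamese encoder to concatenated audio and visual features and trains it with weak same/different word-pair labels. The feature dimensions, layer sizes and contrastive loss are illustrative assumptions, not the paper's configuration.

        # Minimal sketch of a Siamese encoder over concatenated audio+visual features.
        import torch
        import torch.nn as nn
        import torch.nn.functional as F

        class SiameseEncoder(nn.Module):
            def __init__(self, audio_dim=40, visual_dim=20, embed_dim=64):
                super().__init__()
                self.net = nn.Sequential(
                    nn.Linear(audio_dim + visual_dim, 256), nn.ReLU(),
                    nn.Linear(256, embed_dim),
                )

            def forward(self, audio, visual):
                # Mono-task: concatenate the two modalities before encoding.
                return self.net(torch.cat([audio, visual], dim=-1))

        def same_different_loss(emb_a, emb_b, same, margin=0.5):
            """Pull embeddings of 'same' word pairs together, push 'different' pairs apart."""
            sim = F.cosine_similarity(emb_a, emb_b, dim=-1)
            return torch.where(same, 1.0 - sim, F.relu(sim - margin)).mean()

        enc = SiameseEncoder()
        a1, v1, a2, v2 = torch.randn(8, 40), torch.randn(8, 20), torch.randn(8, 40), torch.randn(8, 20)
        same = torch.tensor([True, False] * 4)
        print(same_different_loss(enc(a1, v1), enc(a2, v2), same))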

    Occlusion resistant learning of intuitive physics from videos

    To reach human performance on complex tasks, a key ability for artificial systems is to understand physical interactions between objects and to predict future outcomes of a situation. This ability, often referred to as intuitive physics, has recently received attention, and several methods have been proposed to learn these physical rules from video sequences. Yet most of these methods are restricted to the case where no, or only limited, occlusions occur. In this work we propose a probabilistic formulation of learning intuitive physics in 3D scenes with significant inter-object occlusions. In our formulation, object positions are modeled as latent variables enabling the reconstruction of the scene. We then propose a series of approximations that make this problem tractable. Object proposals are linked across frames using a combination of a recurrent interaction network, modeling the physics in object space, and a compositional renderer, modeling the way in which objects project onto pixel space. We demonstrate significant improvements over the state of the art on the IntPhys intuitive physics benchmark. We apply our method to a second dataset with increasing levels of occlusion, showing that it realistically predicts segmentation masks up to 30 frames into the future. Finally, we also show results on predicting the motion of objects in real videos.
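
    To make the object-space physics component concrete, the sketch below shows one step of a generic interaction-network-style dynamics model: pairwise relation effects between latent object states are aggregated and used to update each object. The layer sizes and state dimension are assumptions for illustration, and the paper's recurrent network and compositional renderer are not reproduced here.

        # Hedged sketch of one interaction-network-style update over latent object states.
        import torch
        import torch.nn as nn

        class InteractionStep(nn.Module):
            def __init__(self, state_dim=4, effect_dim=32):
                super().__init__()
                self.relation = nn.Sequential(nn.Linear(2 * state_dim, 64), nn.ReLU(),
                                              nn.Linear(64, effect_dim))
                self.object = nn.Sequential(nn.Linear(state_dim + effect_dim, 64), nn.ReLU(),
                                            nn.Linear(64, state_dim))

            def forward(self, states):
                # states: (N, state_dim) latent positions/velocities of N objects.
                n = states.size(0)
                receivers = states.unsqueeze(1).expand(n, n, -1)
                senders = states.unsqueeze(0).expand(n, n, -1)
                effects = self.relation(torch.cat([receivers, senders], dim=-1))
                mask = 1.0 - torch.eye(n).unsqueeze(-1)        # drop self-interactions
                effects = (effects * mask).sum(dim=1)          # aggregate incoming effects
                return states + self.object(torch.cat([states, effects], dim=-1))

        print(InteractionStep()(torch.randn(3, 4)).shape)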

    Developmental Psychology: A Precursor of Moral Judgment in Human Infants?

    Human infants evaluate social interactions well before they can speak, and show a preference for characters that help others over characters that are uncooperative or hindering.

    Phoneme learning is influenced by the taxonomic organization of the semantic referents

    Word learning relies on the ability to master the sound contrasts that are phonemic (i.e., signal a meaning difference) in a given language. Though the timeline of phoneme development has been studied extensively over the past few decades, the mechanism of this development is poorly understood. Previous work has shown that human learners rely on referential information to differentiate similar sounds, but has largely ignored the problem of taxonomic ambiguity at the semantic level (two different objects may be described by one or two words depending on how abstract the meaning intended by the speaker is). In this study, we varied the taxonomic distance of pairs of objects and tested how adult learners judged the phonemic status of the sound contrast associated with each of these pairs. We found that judgments were sensitive to gradients in the taxonomic structure, suggesting that learners use probabilistic information at the semantic level to optimize the accuracy of their judgments at the phonological level. The findings provide evidence for an interaction between phonological learning and meaning generalization, raising important questions about how these two important processes of language acquisition are related.

    Epenthetic vowels in Japanese: A perceptual illusion?

    In four cross-linguistic experiments comparing French and Japanese hearers, we found that the phonotactic properties of Japanese (a very reduced set of syllable types) induce Japanese listeners to perceive "illusory" vowels inside consonant clusters in VCCV stimuli. In Experiments 1 and 2, we used a continuum of stimuli ranging from no vowel (e.g. ebzo) to a full vowel between the consonants (e.g. ebuzo). Japanese, but not French, participants reported the presence of a vowel [u] between consonants, even in stimuli with no vowel. A speeded ABX discrimination paradigm was used in Experiments 3 and 4, and revealed that Japanese participants had trouble discriminating between VCCV and VCuCV stimuli. French participants, in contrast, had problems discriminating items that differ in vowel length (ebuzo vs. ebuuzo), a distinctive contrast in Japanese but not in French. We conclude that models of speech perception have to be revised to account for phonotactically based assimilations.
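
    The speeded ABX paradigm referenced above is scored by whether, on each trial, the listener correctly matches the third stimulus (X) to the first (A) or the second (B). The sketch below computes per-condition accuracy from such responses; the field names and toy trials are illustrative, not data from the experiments.

        # Minimal sketch of scoring ABX discrimination responses per condition.
        from collections import defaultdict

        def abx_accuracy(trials):
            """trials: list of dicts with 'condition', 'correct_answer', 'response'."""
            hits, counts = defaultdict(int), defaultdict(int)
            for t in trials:
                counts[t['condition']] += 1
                hits[t['condition']] += int(t['response'] == t['correct_answer'])
            return {cond: hits[cond] / counts[cond] for cond in counts}

        print(abx_accuracy([
            {'condition': 'ebzo-ebuzo', 'correct_answer': 'A', 'response': 'A'},
            {'condition': 'ebzo-ebuzo', 'correct_answer': 'B', 'response': 'A'},
            {'condition': 'ebuzo-ebuuzo', 'correct_answer': 'A', 'response': 'A'},
        ]))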

    On Doing Things Intentionally

    Recent empirical and conceptual research has shown that moral considerations influence the way we use the adverb 'intentionally'. Here we propose our own account of these phenomena, according to which they arise from the fact that the adverb 'intentionally' has three different meanings that are differentially selected by contextual factors, including normative expectations. We argue that our hypotheses can account for most available data, and we present some new results that support this. We end by discussing the implications of our account for folk psychology.

    Language and its acquisition: biological and psychological bases

    Emmanuel Dupoux, Director of Studies. 1. Psychological bases of moral judgments. The study of the psychological and neurobiological bases of moral judgment has taken a fresh start over the past five years. This renewed interest is due to the confluence of three lines of research: 1) the study of emotions, 2) the neuroscience of social cognition, 3) the analogy between grammatical intuitions and moral intuitions. Regarding the first line, Haidt (2001) discovered that certain…